This paper generalizes a previously published differential equation that describes the relation between the age-specific incidence, remission, and mortality of a disease and its prevalence. The underlying model is a simple compartment model with three states (illness-death model). In contrast to the former work, migration and calendar-time effects are included. As an application of the theoretical findings, a hypothetical example of an irreversible disease is treated.

1 Introduction

For basic epidemiological parameters such as the incidence, prevalence and mortality of a disease, it has proven useful to consider simple illness-death models [8] as shown in Figure 1. Depending on the context, these are sometimes referred to as state models or compartment models. Here we consider three states: Normal or non-diseased with the number of people denoted by $S$ (susceptible), the diseased state with number $C$ (cases), and the death state.

The transition intensities between the states are henceforth denoted by the symbols in Figure 1: incidence $i$, remission $r$, and mortality rates^1 $m_0$ and $m_1$ of the non-diseased and the diseased, respectively. In general, the intensities depend on calendar time $t$, age $a$ and sometimes also on the duration $d$ of the disease.



1 The expressions rate and density are used synonymously in this article.

Figure 1: Simple illness-death model

Models of this kind are quite common; see for example [8], [9] or the textbook [7]. Murray and Lopez ([13] and [H]) considered such a compartment model with rates that are independent of calendar time $t$ and duration $d$. In the context of the Global Burden of Disease study of the World Health Organization, they used the following system of ordinary differential equations (ODEs) to describe the transitions between the three states:

$$
\begin{aligned}
\frac{dS}{da} &= -(i + m_0) \cdot S + r \cdot C \\
\frac{dC}{da} &= i \cdot S - (m_1 + r) \cdot C.
\end{aligned}
\tag{1}
$$

By this system, the changes in the numbers of non-diseased and diseased persons aged $a$ are related to the intensities in Figure 1. Age here plays the role of temporal progression. This homogeneous linear system of ODEs looks relatively harmless, but it is limited by its strong assumptions. An easy calculation shows that Eq. (1) implies that the population is stationary. Let $N(a) := S(a) + C(a)$ denote the total number of persons alive in the population aged $a$. For $a \in [0, \omega]$ with $N(a) > 0$ define the age-specific prevalence

$$
p(a) := \frac{C(a)}{C(a) + S(a)}. \tag{2}
$$

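Then from Eq. (1) it follows

$$
\begin{aligned}
\frac{dN}{da} &= \frac{dS}{da} + \frac{dC}{da} \\
&= -m_0 \cdot S - m_1 \cdot C \\
&= -N \cdot \left[(1 - p) \cdot m_0 + p \cdot m_1\right].
\end{aligned}
$$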

The term $(1 - p) \cdot m_0 + p \cdot m_1$ is the overall mortality $m$ in the population. Hence, it holds $\frac{dN}{da} = -m \cdot N$, which is the defining equation of a stationary population [16]. Although the model of a stationary population is widely used in demography, real populations are rarely stationary. Moreover, the explicit dependence on the absolute numbers $S$ and $C$ is inconvenient. It would be better if Eq. (1) could be expressed in terms of the age-specific prevalence (2), which can indeed be achieved. In p.] it has been shown that system (1) can be transformed into the following one-dimensional ODE of Riccati type:

$$
\frac{dp}{da} = (1 - p) \cdot \left(i - p \cdot (m_1 - m_0)\right) - r \cdot p. \tag{3}
$$

The importance of Eqs. (1) and (3) is obvious. For given incidence, remission and mortality rates plus an initial condition, the age profiles of the numbers of patients and of the prevalence, respectively, are uniquely determined. To state it clearly, the "forces" incidence, remission and mortality uniquely prescribe the prevalence, not only qualitatively but in quantitative terms. This is called the forward problem: we infer from the causes (the forces) to the effect (the numbers of diseased or the prevalence, respectively). If in the scalar Riccati ODE (3) the age profiles of the prevalence, mortality and remission are known, one can directly solve Eq. (3) for the incidence. This is the inverse problem: we conclude from the effect to the cause. This allows, for example, cross-sectional studies to be used for incidence estimates, where otherwise lengthy follow-up studies are needed. For an example on real data, see [1]. Recently, it has been proven that the inverse problem is ill-posed [2].
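
As a minimal sketch of the forward and inverse problems for Eq. (3), the following Python code integrates the Riccati ODE with a classical Runge-Kutta scheme and then recovers the incidence from the computed prevalence by solving Eq. (3) for $i$, that is $i = (dp/da + r \cdot p)/(1 - p) + p \cdot (m_1 - m_0)$. The rate functions, the step size and the central-difference approximation of $dp/da$ are illustrative assumptions and are not taken from the article.

```python
import math

# Illustrative, time-independent rates (assumptions for this sketch only).
def incidence(a):
    return max(a - 30.0, 0.0) / 3000.0

def remission(a):
    return 0.0  # irreversible disease

def mort_healthy(a):
    return math.exp(-10.7 + 0.1 * a)

def mort_diseased(a):
    return 2.0 * mort_healthy(a)

def rhs(a, p):
    """Right-hand side of Eq. (3)."""
    excess = mort_diseased(a) - mort_healthy(a)
    return (1.0 - p) * (incidence(a) - p * excess) - remission(a) * p

def forward(a_max=100.0, n_steps=1000, p0=0.0):
    """Forward problem: prevalence p(a) on an equidistant age grid (RK4)."""
    h = a_max / n_steps
    ages = [k * h for k in range(n_steps + 1)]
    prev = [p0]
    for a in ages[:-1]:
        p = prev[-1]
        k1 = rhs(a, p)
        k2 = rhs(a + h / 2, p + h * k1 / 2)
        k3 = rhs(a + h / 2, p + h * k2 / 2)
        k4 = rhs(a + h, p + h * k3)
        prev.append(p + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return ages, prev

def inverse(ages, prev):
    """Inverse problem: incidence at the interior grid points from p(a)."""
    result = []
    for k in range(1, len(ages) - 1):
        a, p = ages[k], prev[k]
        dp = (prev[k + 1] - prev[k - 1]) / (ages[k + 1] - ages[k - 1])
        excess = mort_diseased(a) - mort_healthy(a)
        result.append((dp + remission(a) * p) / (1.0 - p) + p * excess)
    return result

ages, prev = forward()
recovered = inverse(ages, prev)  # recovered[k - 1] belongs to ages[k]
```

With exact, noise-free prevalence values the recovered incidence agrees closely with the incidence that was put in. With prevalence estimated from real data, the numerical derivative amplifies errors, which is one practical face of the ill-posedness mentioned above.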

The article is organized as follows: In the next section, Eq. (3) is generalized to allow dependency on calendar time and migration. The central result is a partial differential equation (PDE). As for the ODE, there is a forward and an inverse problem for the PDE in the general case, too. These are analyzed using simulated register data of a hypothetical chronic disease in the section thereafter. Finally, the results are summed up.

2 General equation of disease dynamics 

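In this section the simple illness-death model of Figure 1 is generalized. The rates $i$, $r$, $m_0$ and $m_1$ henceforth depend on age $a$ and calendar time $t$, but are assumed to be independent of the duration $d$. Furthermore, let the numbers of non-diseased persons $S(t,a)$ and diseased persons $C(t,a)$ aged $a$ at time $t$ be non-negative and partially differentiable. Define $N(t,a) := S(t,a) + C(t,a)$. Additionally, let $\sigma(t,a)$ and $\gamma(t,a)$ denote those proportions of $N(t,a)$ such that $\sigma(t,a) \cdot N(t,a)$ and $\gamma(t,a) \cdot N(t,a)$ are the net migration rates of non-diseased and diseased persons aged $a$ at time $t$, respectively:

$$
\begin{aligned}
\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) S &= -(i + m_0) \cdot S + r \cdot C + \sigma \cdot N \\
\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) C &= i \cdot S - (m_1 + r) \cdot C + \gamma \cdot N.
\end{aligned}
\tag{4}
$$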



After introducing the age-specific prevalence $p(t,a)$ in year $t$,

$$
p(t,a) := \frac{C(t,a)}{S(t,a) + C(t,a)},
$$

for $(t,a) \in D := \{(t,a) \in [0,\infty)^2 \mid C(t,a) \ge 0,\ S(t,a) \ge 0,\ C(t,a) + S(t,a) > 0\}$ the system (4) can be transformed into an equation similar to (3):

Theorem 2.1. Let $S(t,a)$ and $C(t,a)$ be given by Eq. (4), then $p(t,a)$ is partially differentiable in $D$ and it holds

$$
\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) p = (1 - p) \cdot \left[i - p \cdot (m_1 - m_0)\right] - r \cdot p + \mu, \tag{5}
$$

where $\mu := \gamma (1 - p) - p \sigma$ describes the impact of migration.

Proof. Follows from applying the quotient rule to $p(t,a) = \frac{C(t,a)}{N(t,a)}$ and using Eq. (4). □

Obviously, if the incidence and mortality rates do not depend on calendar time $t$, then Eq. (3) follows from Eq. (5) with $\mu = 0$. Hence, Eq. (3) does not depend on the stationary population assumption.
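
The quotient-rule step in the proof of Theorem 2.1 can be written out as follows. This is a sketch that assumes system (4) has the form $\mathcal{D} S = -(i + m_0) S + r C + \sigma N$ and $\mathcal{D} C = i S - (m_1 + r) C + \gamma N$, where $\mathcal{D}$ abbreviates $\partial/\partial a + \partial/\partial t$ (a shorthand introduced only for this derivation); this form is inferred from the definitions of $\sigma$ and $\gamma$ given above.

```latex
\begin{aligned}
\mathcal{D} N &= \mathcal{D} S + \mathcal{D} C
   = -m_0 S - m_1 C + (\sigma + \gamma) N, \\
\mathcal{D} p &= \frac{\mathcal{D} C}{N} - p \, \frac{\mathcal{D} N}{N} \\
 &= i (1 - p) - (m_1 + r) p + \gamma
    + p \left[ (1 - p) m_0 + p \, m_1 \right] - p (\sigma + \gamma) \\
 &= (1 - p) \left[ i - p (m_1 - m_0) \right] - r p
    + \gamma (1 - p) - p \sigma .
\end{aligned}
```

The last two terms are exactly the migration term $\mu$, and setting $\sigma = \gamma = 0$ with time-independent rates gives back Eq. (3).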

For applications in epidemiology it is important that solutions of Eq. (5) are meaningful, i.e. $p(t,a) \in [0,1]$ for all $(t,a) \in D$. To this end we note:

Theorem 2.2. For all $(t,a) \in D$ the following statements are equivalent:
(1) $p(t,a) = \frac{C(t,a)}{S(t,a) + C(t,a)}$ is a solution of Eq. (5).
(2) $S(t,a) = (1 - p(t,a)) \cdot N(t,a)$ and $C(t,a) = p(t,a) \cdot N(t,a)$ are solutions to Eq. (4).

Proof. This follows by inserting the expressions into the PDEs. □

By Theorem 2.2 a solution $p(t,a)$ of Eq. (5) can be written as $p(t,a) = \frac{C(t,a)}{N(t,a)}$ with $N(t,a) = S(t,a) + C(t,a)$. For $(t,a) \in D$ this implies $p(t,a) \in [0,1]$.



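The migration term $\mu$ will now be analyzed further. Let $\varphi := \sigma + \gamma$ be the overall migration rate. We split all migration rates $f \in \{\varphi, \sigma, \gamma\}$ into a positive part $f_+ \ge 0$ (immigration) and a negative part $f_- \ge 0$ (emigration):

$$
f = f_+ - f_-, \quad \text{for } f \in \{\varphi, \sigma, \gamma\}.
$$

Moreover, for $\varphi_-(t,a) > 0$ define $p_-^{(m)}(t,a) := \frac{\gamma_-(t,a)}{\varphi_-(t,a)}$, the prevalence of the disease in the emigrants, and for $\varphi_+(t,a) > 0$ define $p_+^{(m)}(t,a) := \frac{\gamma_+(t,a)}{\varphi_+(t,a)}$, the prevalence in the immigrants.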

Proposition 2.1. With the notations as above it holds

$$
\mu(t,a) =
\begin{cases}
\varphi_+(t,a) \cdot p_+^{(m)}(t,a) - \varphi_-(t,a) \cdot p_-^{(m)}(t,a) - \varphi(t,a) \cdot p(t,a), & \text{for } \varphi_-(t,a), \varphi_+(t,a) > 0; \\
\varphi_+(t,a) \cdot \left[p_+^{(m)}(t,a) - p(t,a)\right], & \text{for } \varphi_-(t,a) = 0,\ \varphi_+(t,a) > 0; \\
-\varphi_-(t,a) \cdot \left[p_-^{(m)}(t,a) - p(t,a)\right], & \text{for } \varphi_-(t,a) > 0,\ \varphi_+(t,a) = 0; \\
0, & \text{for } \varphi_-(t,a) = \varphi_+(t,a) = 0.
\end{cases}
$$

Proof. For all $(t,a) \in D$ it holds $\mu = \gamma - \varphi \cdot p$. By splitting this expression into positive and negative parts, the Proposition follows. □

Under the assumption that the prevalence among those aged $a$ at time $t$ who immigrate is the same as among those who emigrate, say $p^{(m)}(t,a)$, it holds

$$
\mu = \varphi \cdot \left(p^{(m)} - p\right). \tag{6}
$$

Hence, if the prevalence $p^{(m)}$ of the migrants is the same as that of those who stay, $p^{(m)} = p$, the change in prevalence $\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) p$ does not depend on migration.

This is an important result, because in illness-death models the absence of migration is often assumed. In our framework this restriction is not necessary: even if there is migration, the prevalence is not affected by it as long as the prevalence in the migrants is the same as in the resident population.
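
A small numerical check of the migration term may help here. The Python sketch below evaluates $\mu = \gamma - \varphi \cdot p$ from immigration and emigration components; the function name, its arguments and all numbers are hypothetical and chosen only for illustration.

```python
def migration_term(p, phi_in, p_in, phi_out, p_out):
    """Migration term mu = gamma - phi * p.

    phi_in, phi_out: immigration and emigration rates (>= 0), as proportions of N.
    p_in, p_out:     prevalence among immigrants and among emigrants.
    """
    gamma = phi_in * p_in - phi_out * p_out  # net migration rate of the diseased
    phi = phi_in - phi_out                   # overall net migration rate
    return gamma - phi * p

# Migrants have the same prevalence as residents: no effect on the prevalence.
mu_neutral = migration_term(p=0.10, phi_in=0.02, p_in=0.10, phi_out=0.01, p_out=0.10)

# Immigrants healthier than residents, emigrants like residents: mu is negative.
mu_healthy = migration_term(p=0.10, phi_in=0.02, p_in=0.05, phi_out=0.01, p_out=0.10)
```

In the first call $\mu$ vanishes up to rounding, in line with Eq. (6). In the second call $\mu = 0.02 \cdot (0.05 - 0.10) = -0.001$, so migration lowers the prevalence although the emigrants do not differ from the residents.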

The solution of Eq. (5) can be obtained by the method of characteristics [15]. Let an initial condition of the form $p(0,a) = p_0(a)$ be given; then we have a so-called Cauchy problem, which has a unique solution if the right-hand side of the PDE is sufficiently smooth [15]. This solution is calculated as follows. Assume that the prevalence for those aged $a$ in year $t$ has to be calculated. First, rearrange the right-hand side of Eq. (5) such that

$$
\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) p = \alpha(t,a) + \beta(t,a) \cdot p + \gamma(t,a) \cdot p^2,
$$

where $\gamma$ here denotes the coefficient of $p^2$, not the migration rate of the diseased. Second, solve the initial value problem given by the following Riccati ODE:

$$
\frac{dy(\tau)}{d\tau} = \alpha(\tau, \tau + a_0) + \beta(\tau, \tau + a_0) \cdot y + \gamma(\tau, \tau + a_0) \cdot y^2, \tag{7}
$$

with initial value $y(0) = p_0(a_0)$, where $a_0 := a - t$. Then an easy calculation shows that $y(t) = p(t,a)$ is the desired value.
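
The two steps above can be sketched in Python as follows. The function `prevalence_at`, its arguments and the fixed-step Runge-Kutta scheme are assumptions of this sketch. The coefficient functions take their arguments in the order $(t,a)$, and the sketch only covers $a \ge t$, so that the characteristic starts on the line $t = 0$. For Eq. (5) with $\mu = 0$, expanding the right-hand side gives $\alpha = i$, $\beta = -(i + r + m_1 - m_0)$ and $\gamma = m_1 - m_0$.

```python
def prevalence_at(t, a, p0, alpha, beta, gamma, n_steps=200):
    """Approximate p(t, a) by integrating Eq. (7) along the characteristic
    through (0, a - t) with the classical fourth-order Runge-Kutta scheme."""
    a0 = a - t
    if a0 < 0:
        raise ValueError("this sketch needs a >= t (start on the line t = 0)")

    def rhs(tau, y):
        age = tau + a0
        return alpha(tau, age) + beta(tau, age) * y + gamma(tau, age) * y * y

    h = t / n_steps
    y = p0(a0)
    for k in range(n_steps):
        tau = k * h
        k1 = rhs(tau, y)
        k2 = rhs(tau + h / 2, y + h * k1 / 2)
        k3 = rhs(tau + h / 2, y + h * k2 / 2)
        k4 = rhs(tau + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


# Usage with constant, purely illustrative rates (r = 0, mu = 0).
i_rate, m0_rate, m1_rate = 0.01, 0.02, 0.04
p_60 = prevalence_at(
    t=20.0, a=60.0,
    p0=lambda a: 0.05,
    alpha=lambda t, a: i_rate,
    beta=lambda t, a: -(i_rate + m1_rate - m0_rate),
    gamma=lambda t, a: m1_rate - m0_rate,
)
```

Repeating the call for every age of interest yields the whole age profile $p(t, \cdot)$; each age needs its own characteristic, so the calls are independent of each other.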



3 Application on a simulated register 

In this section, the application of the above-formulated Cauchy problem to a simulated register of a chronic disease is shown. Since the disease is assumed to be irreversible, it holds $r = 0$. First, we address a direct problem: from a given age-specific prevalence at some point in time $t_0$ we want to deduce the age-specific prevalence at $t_1$, $t_1 > t_0$, by applying Eq. (5) with $\mu = 0$. Second, an inverse problem is formulated. Assume that in the year $t_0$ the functions $p_0 = p(t_0, \cdot)$, $i(t_0, \cdot)$, $m_0(t_0, \cdot)$ and $m_1(t_0, \cdot)$ were measured. If in year $t_1$, $t_1 > t_0$, the age profile of the prevalence $p(t_1, \cdot)$ is given, the question arises: how has the course of the age-specific incidence changed in the meantime? This is an inverse problem, because we infer from the effect (prevalence in $t_1$) to the causes. Here we will formulate a simple, straightforward solution by an optimization approach.

Both problems will be treated based on data from a simulated register. The register is designed such that over a period of 150 years all persons are tracked from birth to death. For each person, the date of a possible diagnosis of the chronic disease is recorded. The simulation is based on the following assumptions:

1. In each calendar year from 0 to 150, 2,000 people are born. The births during the calendar year follow a uniform distribution.

2. The mortality of the non-diseased persons is of Strehler-Mildvan type and is given by the equation

$$
m_0(t,a) = \exp(-10.7 + 0.1 a) \cdot (1 - 0.002)^{(t - 20)_+}.
$$

The notation $(t - 20)_+$ denotes the positive part of the expression $(t - 20)$. The exponential term approximates the current mortality of men in Germany; the second factor takes the increasing life expectancy into account.

3. The incidence is described by the equation

$$
i(t,a) = \frac{(a - 30)_+}{3000} \cdot 0.99^{(t - 50)_+}. \tag{8}
$$

4. The relative risk of death is constant for all ages and times:

$$
R(t,a) = \frac{m_1(t,a)}{m_0(t,a)} = 2.
$$
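
Assumptions 2 to 4 translate directly into code. The following Python sketch encodes the three rates as functions of calendar time $t$ and age $a$; the function names are chosen here for illustration only.

```python
import math

def pos(x):
    """Positive part (x)_+ of a number."""
    return max(x, 0.0)

def mort_healthy(t, a):
    """m_0(t, a): mortality of the non-diseased (assumption 2)."""
    return math.exp(-10.7 + 0.1 * a) * (1.0 - 0.002) ** pos(t - 20.0)

def incidence(t, a):
    """i(t, a): incidence according to Eq. (8) (assumption 3)."""
    return pos(a - 30.0) / 3000.0 * 0.99 ** pos(t - 50.0)

def mort_diseased(t, a):
    """m_1(t, a): mortality of the diseased, relative risk 2 (assumption 4)."""
    return 2.0 * mort_healthy(t, a)

# The incidence is zero up to age 30 and, at a fixed age, declines by 1 %
# per calendar year from year 50 onwards.
ratio_20_years = incidence(140.0, 60.0) / incidence(120.0, 60.0)  # equals 0.99 ** 20
```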



After the simulation, each person in the register is represented by four pieces of information:

1) A unique identification number (an integer), 

2) Calendar year of birth, 

3) The person's age in years at diagnosis (0 if the person does not fall ill), 

4) Age of death of the person in years. 

Entries 2) to 4) in the register are given to three decimals, which corresponds to a precision of one day. The identification number of the person is a running counter. The date of birth (in calendar years) is given by the simulated year; the decimals are drawn from a uniform distribution on $[0,1)$. To decide whether a thus far non-diseased person born in year $\tau$ becomes ill or dies without the disease, a competing risk approach in a discrete event simulation (DES) is used. Based on the cumulative distribution function of the common risk (total intensity $i(\tau + a, a) + m_0(\tau + a, a)$), the age $a_0$ of the event is drawn by inverse transform sampling (inversion method, [5]). Based on a comparison between $i(\tau + a_0, a_0)$ and $m_0(\tau + a_0, a_0)$, it is decided whether the onset of the disease or the death without disease occurred. In the first case $a_0$ represents the age at the disease's onset; in the second case, $a_0$ is the age at death. If the person gets the disease, the age at death is simulated (conditional on reaching the age $a_0$).
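
One way to implement the competing risk step is sketched below in Python. It is not the author's code: the step size, the midpoint rule for the cumulative hazard, the age cap, and in particular the rule that the event is a disease onset with probability $i / (i + m_0)$ at age $a_0$ are assumptions made for this sketch. The rate functions repeat assumptions 2 to 4 of the simulation.

```python
import math
import random

def pos(x):
    return max(x, 0.0)

def mort_healthy(t, a):
    return math.exp(-10.7 + 0.1 * a) * (1.0 - 0.002) ** pos(t - 20.0)

def mort_diseased(t, a):
    return 2.0 * mort_healthy(t, a)

def incidence(t, a):
    return pos(a - 30.0) / 3000.0 * 0.99 ** pos(t - 50.0)

def draw_event_age(hazard, start_age, rng, step=0.05, max_age=150.0):
    """Inverse transform sampling: return the first grid age at which the
    cumulative hazard since start_age reaches -log(1 - U), U uniform on [0, 1)."""
    target = -math.log(1.0 - rng.random())
    cumulative, age = 0.0, start_age
    while cumulative < target and age < max_age:
        cumulative += hazard(age + step / 2) * step  # midpoint rule
        age += step
    return age

def simulate_person(birth_year, rng):
    """Return (age at diagnosis, age at death); diagnosis is 0.0 if never ill."""
    def total(a):
        return incidence(birth_year + a, a) + mort_healthy(birth_year + a, a)

    a0 = draw_event_age(total, 0.0, rng)
    i0 = incidence(birth_year + a0, a0)
    m0 = mort_healthy(birth_year + a0, a0)
    if rng.random() < i0 / (i0 + m0):
        # Disease onset at a0; death then follows the mortality of the diseased.
        death = draw_event_age(lambda a: mort_diseased(birth_year + a, a), a0, rng)
        return a0, death
    return 0.0, a0

rng = random.Random(2024)
register = [simulate_person(birth_year=100.0, rng=rng) for _ in range(20)]
```

The coarse step of 0.05 years keeps the sketch fast; a register with a precision of one day, as described above, would need a finer step or a root-finding step on the cumulative hazard.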

As exactly 2,000 people are born in each of the calendar years 0 to 150, the hypothetical register contains $151 \cdot 2000 = 302{,}000$ persons. Then, the events of the hypothetical register are transformed into a Lexis diagram with five-year intervals [10]. This allows an easy extraction of the person-years and the numbers of events in the corresponding age and period classes.

In both test cases, the direct and the inverse problem, we assume information to be given at only two points in time, $t_0$ and $t_1$. Of course, three or more points in time would be advantageous, but with respect to applicability in epidemiological contexts, the test problems try to mimic a minimalistic setting.

3.1 Direct problem 

Assume we have measured the age profile $p_0 = p(t_0, \cdot)$ of the prevalence in $t_0$, and the age-specific incidence $i(t_0, \cdot)$ and mortality densities $m_0(t_0, \cdot)$ and $m_1(t_0, \cdot)$. Furthermore, at a later point in time $t_1 > t_0$ let the age-specific incidence $i(t_1, \cdot)$ and mortalities $m_0(t_1, \cdot)$ and $m_1(t_1, \cdot)$ be given. The direct problem refers to the question: what can be said about the age-specific prevalence $p(t_1, \cdot)$ in $t_1$?

To answer this question, age-specific incidence and mortality rates at two time points $t_0 = 120$ and $t_1 = 140$ (years) are extracted from the register. In addition, the age-specific prevalence $p(t_0, \cdot)$ is collected at $t_0$. Figure 2 shows the extracted age-specific incidence density (dashed lines) in $t_0$ (red) and $t_1$ (blue) in comparison with the theoretical values (solid lines).

Now consider the Cauchy problem that is given by Eq. (5) with the initial condition $p(t_0, \cdot) = p_0$. For the solution one needs the functions $i(t, \cdot)$, $m_0(t, \cdot)$ and $m_1(t, \cdot)$ for all time points $t$ between $t_0$ and $t_1$. For this, the function values are interpolated affine-linearly. The initial value problem Eq. (7) is solved numerically using the MATLAB^2 function ode45.
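
The affine-linear interpolation in time can be sketched as follows in Python. The helper `interpolate_in_time` is hypothetical; it takes two age profiles measured at $t_0$ and $t_1$ and returns a function of $(t,a)$. The usage lines compare the interpolated incidence halfway between $t_0 = 120$ and $t_1 = 140$ with the exponential decline of Eq. (8), as a rough indication of the interpolation error in this setting.

```python
def interpolate_in_time(f0, f1, t0, t1):
    """Return f(t, a) that equals f0(a) at t0, f1(a) at t1 and is
    affine-linear in t in between."""
    def f(t, a):
        w = (t - t0) / (t1 - t0)
        return (1.0 - w) * f0(a) + w * f1(a)
    return f

# Theoretical incidence profiles at t0 = 120 and t1 = 140 according to Eq. (8).
def inc_t0(a):
    return max(a - 30.0, 0.0) / 3000.0 * 0.99 ** 70

def inc_t1(a):
    return max(a - 30.0, 0.0) / 3000.0 * 0.99 ** 90

inc = interpolate_in_time(inc_t0, inc_t1, 120.0, 140.0)

exact_mid = 60.0 / 3000.0 * 0.99 ** 80           # Eq. (8) at t = 130, a = 90
relative_error = inc(130.0, 90.0) / exact_mid - 1.0  # about 0.005, i.e. 0.5 %
```

The interpolated incidence lies slightly above the exponential curve between the two time points, because a chord of a convex function lies above the function.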

If we compare the numerical solution of the Cauchy problem in year $t_1 = 140$ with the actually observed prevalence in the year 140, we obtain the result shown in Figure 3.

Visually, the predicted curve agrees fairly well with the actually observed age-specific prevalence. The maximum absolute deviation is 0.0146, which means that in this example the prevalence can be predicted to within 1.5 percentage points. The largest deviation is in the oldest age class, where we have only a few cases of the disease.

2 The MathWorks, Natick, Massachusetts, USA

Figure 2: Age-specific incidence density extracted from the register (dashed lines) at $t_0 = 120$ (red) and $t_1 = 140$ (blue) in comparison with the theoretical values (solid lines).

3.2 Inverse problem

In epidemiological studies, it is more laborious to measure incidence rates than prevalences. Hence, in practice, the following inverse problem is much more important than the direct problem of the previous section. Assume that in the year $t_0 = 120$ the functions $p_0 = p(t_0, \cdot)$, $i(t_0, \cdot)$, $m_0(t_0, \cdot)$ and $m_1(t_0, \cdot)$ are known. Moreover, in year $t_1 = 140$ let the age profile of the prevalence $p(t_1, \cdot)$ be given. The functions $m_0(t_1, \cdot)$ and $m_1(t_1, \cdot)$ are also assumed to be known (for example from other epidemiological studies). The question then is how well the incidence $i(t_1, \cdot)$ in the year $t_1$ can be derived from this information. For simplicity, we assume that the incidence $i(t_1, \cdot)$ in $t_1$ can be expressed as the product

$$
i(t_1, \cdot) = i(t_0, \cdot) \cdot (1 - h), \tag{9}
$$

where $h \in [0,1]$. The upper limit for $h$ stems from the fact that incidence rates are non-negative. The lower limit reflects the prior knowledge that the incidence has not increased in $t_1$ compared to $t_0$: $i(t_1, a) \le i(t_0, a)$ for all $a \in [0, \infty)$. Equation (9) corresponds to a proportional hazards approach, which is widely used in epidemiology.

Figure 3: Numerical solution of the direct problem (dashed line) compared to the observed prevalence in year $t_1 = 140$ (solid line).

To solve this inverse problem, we formulate an optimization problem. For given $h \in [0,1]$ and $i(t_0, \cdot)$, the function $i(t_1, \cdot)$ is defined by Eq. (9). If furthermore $p(t_0, \cdot)$, $m_0(t_0, \cdot)$, $m_0(t_1, \cdot)$, $m_1(t_0, \cdot)$ and $m_1(t_1, \cdot)$ are known, then we are in a position to calculate a unique function $p_h(t_1, \cdot)$ by solving the Cauchy problem described in the previous subsection 3.1. The solution $p_h(t_1, \cdot)$ of the direct problem depends on $h$. We can compare $p_h(t_1, \cdot)$ with the measured prevalence $p(t_1, \cdot)$ in the register. Thus, we seek the $h^* \in [0,1]$ that minimizes the Euclidean distance between $p_h(t_1, \cdot)$ and $p(t_1, \cdot)$:



$$
h^* = \operatorname*{arg\,min}_{h \in [0,1]} \int_{A_{t_1}} \left(p_h(t_1, a) - p(t_1, a)\right)^2 \, da, \tag{10}
$$

where $A_{t_1} = \{a \in [0, \infty) \mid (t_1, a) \in D\}$.

Figure 4 shows the Euclidean distance between the prevalence $p(t_1, \cdot)$ in the register and the solution $p_h(t_1, \cdot)$ as a function of $h$.

From the graph in Figure 4 it is obvious that the square of the distance is minimized at about $h^* = 0.25$. Since the incidence decreases by 1% per year from the 50th calendar year onwards and a period of 20 years was considered, a factor $1 - h$ of about $0.99^{20} \approx 0.82 = 1 - 0.18$ is expected. The obtained value $h^* = 0.25$ is about a factor of 1.4 too large.
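
The optimization can be illustrated with a noise-free toy version of the inverse problem in Python. This sketch does not use the simulated register: the target prevalence at $t_1$ is computed from Eq. (5) itself with the rates of assumptions 2 to 4, every cohort is taken to be disease-free at birth, the mortalities are taken as exactly known rather than interpolated, the distance is a sum over an age grid, and the minimizer is found by a plain grid search. All function names, the age grid and the step sizes are assumptions of this sketch.

```python
import math

T0, T1 = 120.0, 140.0

def pos(x):
    return max(x, 0.0)

def m0(t, a):
    return math.exp(-10.7 + 0.1 * a) * (1.0 - 0.002) ** pos(t - 20.0)

def m1(t, a):
    return 2.0 * m0(t, a)

def inc_true(t, a):
    return pos(a - 30.0) / 3000.0 * 0.99 ** pos(t - 50.0)

def integrate(p, t_start, a_start, duration, inc, n_steps):
    """RK4 for Eq. (5) with r = 0 and mu = 0 along the characteristic
    that starts at (t_start, a_start)."""
    step = duration / n_steps

    def rhs(s, y):
        t, a = t_start + s, a_start + s
        return (1.0 - y) * (inc(t, a) - y * (m1(t, a) - m0(t, a)))

    for k in range(n_steps):
        s = k * step
        k1 = rhs(s, p)
        k2 = rhs(s + step / 2, p + step * k1 / 2)
        k3 = rhs(s + step / 2, p + step * k2 / 2)
        k4 = rhs(s + step, p + step * k3)
        p += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p

def prevalence_true(t, a):
    """Follow the birth cohort from age 0 (assumed disease-free) to (t, a)."""
    return integrate(0.0, t - a, 0.0, a, inc_true, max(1, int(4 * a)))

def inc_model(h):
    """Incidence that is affine-linear in time and satisfies Eq. (9) at T1."""
    def inc(t, a):
        w = (t - T0) / (T1 - T0)
        return inc_true(T0, a) * (1.0 - h * w)
    return inc

def squared_distance(h, ages, p_start, p_target):
    inc = inc_model(h)
    return sum(
        (integrate(ps, T0, a - (T1 - T0), T1 - T0, inc, 40) - pt) ** 2
        for a, ps, pt in zip(ages, p_start, p_target)
    )

ages = [40.0 + 5.0 * k for k in range(11)]            # ages 40, 45, ..., 90 at T1
p_start = [prevalence_true(T0, a - (T1 - T0)) for a in ages]
p_target = [prevalence_true(T1, a) for a in ages]
grid = [k / 100.0 for k in range(101)]
h_star = min(grid, key=lambda h: squared_distance(h, ages, p_start, p_target))
```

Because the target here is free of sampling noise, any difference between `h_star` and the expected value of about 0.18 stems from the affine-linear interpolation and the discretization alone, which makes the sketch a convenient baseline for judging how much of the deviation in the register analysis is due to noise.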



Figure 4: Euclidean distance (as in Eq. (10)) as a function of $h$. There is a unique minimum at $h^* = 0.25$.

4 Discussion

In this work we developed a new equation linking incidence, remission and mortality rates with the prevalence of a disease. In contrast to former works, the assumptions of stationary populations, independence from calendar time and zero net migration have been relaxed. The new equation has a wide range of applicability in epidemiological, health care and health economic contexts.

However, it has several limitations. First, Eq. (5) needs the remission rate $r$ and the mortality rate $m_1$ of the diseased to be independent of the duration $d$ of the disease. In real diseases, independence from duration is only an approximation. For many infectious diseases, the immune response depends on the time since onset of the disease. In chronic diseases, too, the duration since onset plays a major role. For example, the age- and sex-adjusted mortality due to coronary heart disease roughly doubles for each 10-year increase in diabetes duration. The all-cause mortality increases by a factor of 1.2 per 10-year duration, [I].

Second, although the new equation is not limited to the case $\mu = 0$, in practical applications information about the health of immigrants and emigrants is seldom obtainable. By Proposition 2.1, reasonable knowledge of the prevalence in all migrants is necessary to accurately treat the case $\mu \neq 0$. To give an example, countries with large-scale immigration programs such as Canada observe a so-called healthy immigrant effect with respect to chronic diseases: immigrants are healthier than residents [11]. Assuming that the emigrants from Canada have the same prevalence as the residents, it would follow that $\mu \neq 0$. However, surveys about the health status of emigrants are missing. The reason is obvious: Canada's taxpayer-funded health care system is interested in measuring the health of those who immigrate, but not of those who emigrate. Hence, information is lacking and assumptions have to be made.

Third, Eq. (5) only considers the prevalence in migrants at the moment of emigration or immigration. Of course, large-scale immigration is likely to change the incidence of the disease in the population, because immigrants' health adapts to the new environment. There are many examples where immigrants from developing countries show an increased incidence of diabetes and related complications when adopting a westernized lifestyle [12]. The opposite may also be true: in Canada, immigrants continue to have a lower relative risk of chronic conditions compared to the native-born, even many years after immigration, P].

Besides theoretical considerations, we applied the new equation to a simulated register of a hypothetical chronic disease. The register has been simulated by Monte Carlo techniques and has been analyzed by a numerical implementation of the new equation. To check the practical applicability of the analysis, the simulation and the analysis have been strictly separated, i.e. the PDE was not used in simulating the register, and no information used as input for the simulation, other than that explicitly mentioned, was exploited in the analysis. The PDE has only been used in the analysis of the direct and inverse problem. In the direct problem, the prevalence at the later point in time $t_1$ could be predicted with high accuracy from the prevalence in $t_0$ twenty years earlier. Of course, the obtained accuracy is a result of the structure inherent to the simulation. The solution of both the direct and the inverse problem uses an affine-linear interpolation for the incidence and mortality rates between $t_0$ and $t_1$. In the simulated register this works well, because it reflects the trends in the incidence and mortalities. Affine-linear interpolation will cause problems if the incidence trend reverses between $t_0$ and $t_1$. An example of such a change of trends can be found in [3]: from 1995 to 2004 the incidence of diabetes was found to rise by an average of 5.3% per year in all age classes, and from 2005 to 2007 the incidence declined by 3.1% per year.

In the inverse problem, the incidence in $t_1$ was reconstructed from the observed prevalence in $t_1$. Provided that the right-hand side of Eq. (5) is sufficiently smooth, the existence of $h^* \in [0,1]$ follows from the continuous dependency of the solution of the Cauchy problem on $h$ from the compact interval $[0,1]$. Continuous dependency can be seen by noting that the solution constructed by the method of characteristics inherits its smoothness properties from the smoothness of the right-hand side of Eq. (7), [6]. The question remains why the result (in terms of $h^*$) is about a factor 1.4 too large. The approach in solving the inverse problem is based on the proportional hazards assumption Eq. (9). In fact, the simulation considers a decline of exponential type, see Eq. (8). Although the exponential in this case can be approximated quite well by an affine-linear interpolation function, it appears that the solution of the inverse problem reacts quite sensitively to inaccuracies. This is in line with our observation that the inverse problem is ill-posed